Expanding Chinese Sentiment Dictionaries from Large Scale Unlabeled Corpus
Authors
Abstract
Unsupervised sentiment classification usually requires a user-defined sentiment dictionary. However, existing Chinese dictionaries are insufficient: for example, the overlap between two popular Chinese sentiment dictionaries, HowNet and NTUSD, is less than 10%. In this paper, we present a method that expands the dictionaries with additional sentiment words by ranking candidate words through link analysis on a word graph constructed from a large unlabeled corpus. The method also computes a sentiment polarity strength for each word in the new dictionaries. Manual evaluation shows that the method expands the dictionaries with high precision. Sentiment classification experiments show that the new dictionaries, together with the polarity strength our algorithm assigns to each word, improve classification performance. As a byproduct, the algorithm can also uncover errors in the current dictionaries.
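The abstract does not say which link-analysis algorithm is used or how the word graph is built. As a rough illustration only, the sketch below assumes a sentence-level co-occurrence graph and a personalized-PageRank-style propagation from seed words, and takes the polarity strength to be the positive score minus the negative score. The function names, the damping value, and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(sentences):
    """Undirected co-occurrence graph (an assumption, not the paper's construction).

    Edge weight = number of sentences in which the two words appear together.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def propagate(graph, seeds, damping=0.85, iterations=50):
    """Personalized PageRank: the restart mass is spread evenly over the seed words."""
    known_seeds = [w for w in seeds if w in graph]
    if not known_seeds:
        raise ValueError("none of the seed words occur in the graph")
    restart = {w: 0.0 for w in graph}
    for w in known_seeds:
        restart[w] = 1.0 / len(known_seeds)
    # Total outgoing weight of each node, used to normalize edge weights.
    out_weight = {w: sum(graph[w].values()) for w in graph}
    score = dict(restart)
    for _ in range(iterations):
        updated = {}
        for w in graph:
            incoming = sum(
                score[n] * graph[n][w] / out_weight[n] for n in graph[w]
            )
            updated[w] = (1.0 - damping) * restart[w] + damping * incoming
        score = updated
    return score


def polarity_strength(graph, positive_seeds, negative_seeds):
    """Signed strength: > 0 leans positive, < 0 leans negative (illustrative definition)."""
    pos = propagate(graph, positive_seeds)
    neg = propagate(graph, negative_seeds)
    return {w: pos[w] - neg[w] for w in graph}


# Toy, pre-segmented corpus (hypothetical data for illustration).
toy_corpus = [
    ["好", "喜欢", "满意"],
    ["差", "讨厌", "失望"],
    ["好", "满意", "推荐"],
    ["差", "失望", "退货"],
]
graph = build_word_graph(toy_corpus)
strength = polarity_strength(graph, positive_seeds={"好"}, negative_seeds={"差"})

# Rank candidate words from most positive to most negative.
for word, value in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(word, round(value, 3))
```

In this sketch, words that are absent from the seed dictionaries but receive a strongly signed score would be the candidates for expansion, and a seed word whose propagated score contradicts its dictionary label would be a candidate dictionary error. Real Chinese text would first need word segmentation, which is omitted here.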
Similar resources
CT-SPA: Text sentiment polarity prediction model using semi-automatically expanded sentiment lexicon
In this study, an automatic classification method based on the sentiment polarity of text is proposed. This method uses two sentiment dictionaries from different sources: the Chinese sentiment dictionary CSWN that integrates Chinese WordNet with SentiWordNet, and the sentiment dictionary obtained from a training corpus labeled with sentiment polarities. In this study, the sentiment polarity of ...
Co-Training for Cross-Lingual Sentiment Classification
The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine tr...
Generate Adjective Sentiment Dictionary for Social Media Sentiment Analysis Using Constrained Nonnegative Matrix Factorization
Although sentiment analysis has attracted a lot of research, little work has been done on social media data compared to product and movie reviews. This is due to the low accuracy that results from the more informal writing seen in social media data. Currently, most sentiment analysis tools for social media choose the lexicon-based approach instead of the machine learning approach because the ...
Build Chinese Emotion Lexicons Using A Graph-based Algorithm and Multiple Resources
For sentiment analysis, lexicons play an important role in many related tasks. In this paper, aiming to build Chinese emotion lexicons for public use, we adopted a graph-based algorithm which ranks words according to a few seed emotion words. The ranking algorithm exploits the similarity between words, and uses multiple similarity metrics which can be derived from dictionaries, unlabeled corpor...
Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews
The lack of reliable Chinese sentiment resources limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment resources on the Web. This article focuses on the problem of cross-lingual sentiment classification, which leverages only available English resources for Chinese sentiment classification. We first investigate several basic meth...